Introduction
- Protein differential expression analysis (DEA) for DIANN, FragPipe DDA, FragPipe TMT, and MaxQuant outputs, or MSstats inputs.
- Uses preprocessing and statistical models implemented in the R package prolfqua (doi.org/10.1021/acs.jproteome.2c00441)
- Generates dynamic HTML reports
- Exports results as XLSX files, and as .rnk and .txt files for GSEA and ORA
- Archived analyses can easily be replicated on any system running R (>= 4.1)
How To
Install R and prolfquapp
install.packages('remotes')
remotes::install_github('wolski/prolfquapp', dependencies = TRUE)
Create a working directory containing:
- config.yaml (parameter file)
- dataset.csv (experimental design)
- the FASTA file
- DIANN, FragPipe or MaxQuant results
Copy the R code into the working directory by running one of the functions:

The content of the working directory is:

Finally, run source("FP_DIA.R") from the R console, or execute Rscript FP_DIA.R in a shell. This creates a subfolder containing the DEA results:

- DE_Groups_vs_Controls.html: a report describing the main steps of the analysis and showing the results.
- DE_Groups_vs_Controls.xlsx: the raw and transformed abundances, annotations, and the results of the differential expression analysis.
- .rnk and .txt files for GSEA and ORA
- Diagnostic plots for each protein (boxplots and line plots of peptide abundances)
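A .rnk file, as read by GSEA Preranked, is a tab-separated two-column list of identifiers and ranking scores, while ORA needs only a list of identifiers. The R sketch below writes both from a small results table. The column names (protein_Id, diff, FDR), the use of the log2 fold change as ranking score, the thresholds, and the file names are all assumptions for illustration, not necessarily what prolfquapp produces.

```r
# Hypothetical DEA results for one contrast (column names are assumptions).
res <- data.frame(
  protein_Id = c("P1", "P2", "P3", "P4"),
  diff       = c(2.1, -1.5, 0.2, 0.9),   # log2 fold change
  FDR        = c(0.001, 0.02, 0.80, 0.04)
)

# .rnk: identifier and score, tab-separated, sorted by decreasing score, no header.
rnk <- res[order(res$diff, decreasing = TRUE), c("protein_Id", "diff")]
rnk_file <- file.path(tempdir(), "contrast.rnk")
write.table(rnk, rnk_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# ORA input: one identifier per line for proteins passing assumed thresholds.
fdr_threshold  <- 0.1
diff_threshold <- 1
sig <- res$protein_Id[res$FDR < fdr_threshold & abs(res$diff) >= diff_threshold]
ora_file <- file.path(tempdir(), "contrast_ora.txt")
writeLines(sig, ora_file)
```

Ranking by a signed test statistic (for example the t-statistic) instead of the raw fold change is a common alternative for GSEA.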
The entire working directory, including input data, R code, and results, is archived. You can unzip the archive later and replicate the analysis with your own R installation.
Analysis parameters
The config.yaml file specifies the parameters of the analysis:
- project-related information (e.g. projectID), shown in the HTML report
- aggregation method (medpolish, rlm, top_3)
- abundance transformation (robscale, vsn, none)
- FDR and effect size thresholds
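To illustrate what the aggregation options do, the sketch below aggregates a small peptide-by-sample matrix of log2 intensities into one protein abundance per sample, once with Tukey's median polish (stats::medpolish) and once with a top-3 rule. It is a simplified stand-in that assumes log-transformed input without missing values; it is not prolfqua's implementation, and the rlm option (a robust linear model) is not shown. The helper names aggregate_medpolish and aggregate_top3 are hypothetical.

```r
# Hypothetical log2 peptide intensities: rows = peptides, columns = samples.
pep <- matrix(
  c(20, 21, 22,
    18, 19, 20,
    16, 17, 18,
    15, 15, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pep", 1:4), paste0("S", 1:3))
)

# Median polish fits pep[i, j] = overall + row[i] + col[j] + residual;
# a common choice for the protein abundance per sample is overall + col[j].
aggregate_medpolish <- function(m) {
  fit <- stats::medpolish(m, trace.iter = FALSE)
  fit$overall + fit$col
}

# Top 3: mean of the three peptides with the highest average intensity.
aggregate_top3 <- function(m, n = 3) {
  top <- order(rowMeans(m), decreasing = TRUE)[seq_len(min(n, nrow(m)))]
  colMeans(m[top, , drop = FALSE])
}

aggregate_medpolish(pep)
aggregate_top3(pep)
```

Both rules recover the same one-unit step between consecutive samples, but at different absolute levels (17, 18, 19 versus 18, 19, 20 here): median polish anchors on the median peptide, whereas top 3 averages only the most intense ones.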

Sample annotation
The dataset.csv file contains the information about the measured samples:
- Relative.Path/Path/raw.file/channel (unique) - identifies the measured sample
- name - used in plots and figures (unique)
- group/experiment - main factor
- subject/bioreplicate (optional) - blocking factor
- control - used to specify the control condition (C) (optional)
The column names are not case sensitive.
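A dataset.csv for a small two-group design with subjects as blocking factor could be written from R as below. The file names, sample names, group labels, and in particular the control coding (C for the control group, T for the others) are hypothetical values for illustration. The last lines show one way to match column names case-insensitively and to check the uniqueness requirements listed above.

```r
# Hypothetical design: 2 groups x 3 biological replicates, paired by subject.
design <- data.frame(
  Relative.Path = sprintf("sample_%02d.raw", 1:6),
  Name          = c("WT_1", "WT_2", "WT_3", "KO_1", "KO_2", "KO_3"),
  Group         = rep(c("WT", "KO"), each = 3),
  Subject       = rep(c("s1", "s2", "s3"), times = 2),
  Control       = rep(c("C", "T"), each = 3)   # assumed coding of the control
)
path <- file.path(tempdir(), "dataset.csv")
write.csv(design, path, row.names = FALSE)

# Read it back and lower-case the column names, so that "Group",
# "group" and "GROUP" are treated alike.
annot <- read.csv(path)
names(annot) <- tolower(names(annot))

# The file/channel column and the sample names must be unique.
stopifnot(!anyDuplicated(annot$relative.path), !anyDuplicated(annot$name))
```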

If subject is specified, the model is abundance ~ group + subject; otherwise, it is abundance ~ group. The group differences to compute are determined from the group and control columns. The MSstats annotation.csv file and dataset.csv have a similar structure.
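For a single protein, the two model formulas can be illustrated with base R's lm, followed by a Benjamini-Hochberg adjustment across proteins. This is a simplified sketch with made-up numbers, not prolfqua's modelling code; treating WT as the control group and the FDR and fold-change thresholds are assumptions.

```r
# Hypothetical log2 abundances of one protein in six samples (3 subjects x 2 groups).
d <- data.frame(
  abundance = c(20.1, 21.0, 19.6, 21.2, 22.3, 20.5),
  group     = rep(c("WT", "KO"), each = 3),
  subject   = factor(rep(c("s1", "s2", "s3"), times = 2))
)

# Make the control group (hypothetically "WT") the reference level,
# so that the group coefficients are differences to the control.
control_group <- "WT"
d$group <- relevel(factor(d$group), ref = control_group)

fit_simple <- lm(abundance ~ group, data = d)            # no subject column
fit_block  <- lm(abundance ~ group + subject, data = d)  # subject as blocking factor

# The coefficient "groupKO" estimates the difference KO - WT.
p_simple <- coef(summary(fit_simple))["groupKO", "Pr(>|t|)"]
p_block  <- coef(summary(fit_block))["groupKO", "Pr(>|t|)"]

# Across many proteins: Benjamini-Hochberg FDR plus an effect-size cut-off
# (p-values, fold changes and both thresholds are made up).
pvals <- c(P1 = 0.001, P2 = 0.01, P3 = 0.04, P4 = 0.5)
diffs <- c(P1 = 2.0, P2 = -0.5, P3 = 1.2, P4 = 3.0)
fdr <- p.adjust(pvals, method = "BH")
significant <- names(pvals)[fdr < 0.1 & abs(diffs) >= 1]
```

In this example both models estimate the same difference (1.1), but the blocked model removes subject-to-subject variation and therefore yields a smaller p-value.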
HTML Report
- Project-related information (project ID, etc.)
- Short introduction to DEA
- Summary of the experimental design
- Summary of protein identification and quantification: missingness, CV, clustering, PCA
- DEA results as volcano plots and tables, which are interactively crosslinked
- Explanation of the output formats, with pointers to follow-up analyses (GSEA, ORA)
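Two of the summaries listed above, missingness and the coefficient of variation (CV), are simple to compute. The sketch below computes them per protein and group from a hypothetical long table of untransformed intensities; the column names and values are assumptions, and the report may define these statistics differently (for example, on another scale).

```r
# Hypothetical long table: one row per protein x sample, NA = not quantified.
long <- data.frame(
  protein_Id = rep(c("P1", "P2"), each = 4),
  group      = rep(rep(c("WT", "KO"), each = 2), times = 2),
  intensity  = c(100, 120, 200, NA, 50, 50, NA, NA)
)

# Fraction of missing values per protein and group; na.pass keeps NA rows.
miss_frac <- aggregate(intensity ~ protein_Id + group, data = long,
                       FUN = function(x) mean(is.na(x)), na.action = na.pass)

# CV in percent (sd / mean) of the observed intensities; the default
# na.omit drops NA rows, so groups without any observation disappear.
cv <- aggregate(intensity ~ protein_Id + group, data = long,
                FUN = function(x) 100 * sd(x) / mean(x))
```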
Summary
- Integrates into a LIMS (doi.org/10.1515/jib-2022-0031)
- The archived working directory contains the results and all data needed to replicate the analysis on your PC
- User-friendly data formats (XLSX, txt, rnk)
